Assume we are given SNP matrix $G \in \mathbb{R}^{N,D}$. Standardizing it, will set the mean $\mu$ to zero and the variance to one for each SNP $j$.

The sample variance for SNP $j$ ($\text{var}_j$) is defined as: $$ \text{var}_j = \frac{1}{N} \sum_{i=1}^N (G_{ij} - \mu)^2 = \frac{1}{N} \sum_{i=1}^N G_{ij}^2 = 1 $$

Thus, when computing the sum of squared entries (as in the "new" normalization scheme), we get:

$$ ss = \sum_{i=1}^N \sum_{j=1}^D G_{ij}^2 = \sum_{j=1}^D N \cdot \text{var}_j = N \sum_{j=1}^D 1 = N \cdot D $$

Thus, normalizing $G$ by $\sqrt{\frac{ss}{N}}$ is equivalent to normalizing by $\sqrt{D}$ if $G$ was unit standardized.


In [28]:
import numpy as np
from pysnptools.standardizer.diag_K_to_N import DiagKtoN
from pysnptools.standardizer import Unit

N = 10
D = 100

np.random.seed(42)
m = np.random.random((N,D))

mu = Unit().standardize(m.copy())

# get factor
d2 = np.sum(mu**2) / float(N)

print "factor:", d2, "== D"
s = DiagKtoN(N)
s.standardize(m)
K = m.dot(m.T)
sum_diag = np.sum(np.diag(K))

print "sum of diagonal", sum_diag


factor: 100.0 == D
sum of diagonal 10.0

In [29]:
# this may not hold true for other standardizers (e.g. beta)...

import numpy as np
from pysnptools.standardizer import Beta

N = 10
D = 100

np.random.seed(42)
m = np.random.random((N,D))

mu = Beta().standardize(m.copy())

# get factor
d2 = np.sum(mu**2) / float(N)

print "factor: ", d2, "!= D"


factor:  0.0624957032658 != D

In [ ]: